Font Descriptor Construction for Printed Thai Character Recognition
Abstract
The proliferation of font types has a great impact on the recognition performance of optical character recognition (OCR) systems. Greater font diversity leads to lower recognition accuracy, particularly for Thai fonts. To overcome this obstacle, this paper proposes a font descriptor for printed Thai character recognition. The descriptor serves as a representative of various fonts and sizes. Its construction is based on principal component analysis (PCA) combined with predefined patterns in multi-level processing. The proposed font descriptor is tested on a Thai character image corpus consisting of consonants, vowels, and tones. The experimental results show that the proposed font descriptor is efficient and robust to variations in font type and size.
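As a rough illustration of the PCA idea mentioned in the abstract, the sketch below builds one descriptor per character from several font renderings and matches a query by reconstruction error. It is not the authors' method: the predefined patterns and multi-level processing are not modeled, and the function names, image size, and synthetic stroke images are all assumptions made for the example.

```python
import numpy as np

def build_descriptor(images, n_components=2):
    """Hypothetical helper: PCA descriptor for one character.

    images: array of shape (n_fonts, h, w) holding size-normalized
    renderings of the same character in different fonts.
    Returns the mean image (flattened) and the top principal axes.
    """
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(image, descriptor):
    """Distance between an image and its projection onto the descriptor subspace."""
    mean, components = descriptor
    x = np.asarray(image, dtype=float).ravel() - mean
    projection = components.T @ (components @ x)
    return float(np.linalg.norm(x - projection))

# Synthetic stand-ins for two characters (assumption: 8x8 images).
rng = np.random.default_rng(0)
base_a = np.zeros((8, 8)); base_a[:, 3] = 1.0   # vertical stroke
base_b = np.zeros((8, 8)); base_b[4, :] = 1.0   # horizontal stroke

# Five noisy "font variants" of each character.
fonts_a = base_a + 0.1 * rng.standard_normal((5, 8, 8))
fonts_b = base_b + 0.1 * rng.standard_normal((5, 8, 8))

desc_a = build_descriptor(fonts_a)
desc_b = build_descriptor(fonts_b)

# An unseen variant of character A should fit descriptor A better.
query = base_a + 0.1 * rng.standard_normal((8, 8))
err_a = reconstruction_error(query, desc_a)
err_b = reconstruction_error(query, desc_b)
```

In this toy setup the query is assigned to whichever character's descriptor gives the smaller error. A real system would work on segmented, size-normalized Thai character images rather than synthetic strokes.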
Similar resources
A Modified Self-organizing Map Neural Network to Recognize Multi-font Printed Persian Numerals (RESEARCH NOTE)
This paper proposes a new method to distinguish printed digits, regardless of font and size, using neural networks. Unlike our proposed method, existing neural-network-based techniques can recognize only the fonts they were trained on. These methods need a large database containing digits in various fonts. New fonts are often introduced to the public, which may not be correctly recognized by the Opti...
Cryptogram Decoding for Optical Character Recognition
Optical character recognition (OCR) systems for machine-printed documents typically require large numbers of font styles and character models to work well. When given a document printed in an unseen font, the performance of those systems degrades even in the absence of noise. In this paper, we perform OCR in an unsupervised fashion without using any character models by using a cryptogram decodin...
Multi-font Optical Character Recognition System for Printed Telugu Text
The Telugu OCR systems currently available on the market recognize only specific Telugu fonts. This paper describes the development of a multi-font OCR system for printed Telugu characters using artificial neural networks. In this system, classification of the characters is carried out using a multilayer neural network architecture.
A Prototype of Multi-Font Printed Chinese Character Reader
An approach to multi-font printed Chinese character recognition is proposed in this paper. The problems of character image input, preprocessing, character segmentation, feature extraction, and character classification are discussed. According to the characteristics of multi-font printed Chinese characters, the number of cutting across strokes, the external and internal areas w...
Multi-feature Extraction for Printed Thai Character Recognition
This paper presents a simplified printed Thai character recognition system using multiple feature extraction and character classification. Three relevant pieces of information extracted from a set of training character images are the direction of each character’s contour, the density of the character body, and the character's peripheral information. This set of features is used as reference for classifying unknown ...